Document weight Query weight Top ten Scheme name

Authors

  • Gerard Salton
  • Christopher Buckley
Abstract

The goal of information retrieval is to enable users to automatically and accurately retrieve data relevant to their queries. One possible approach to this problem is the vector space model, which represents documents and queries as vectors in the term space. The components of the vectors are determined by the term weighting scheme. This paper compares a selected set of the available term weighting schemes to determine which weighting method is best suited to Arabic data collections. Our results show that the best method is the probabilistic inverse (IDFP) method, and we recommend using it as a global weighting method for Arabic data collections.
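As a rough illustration of the vector space model with a global term weight, the sketch below scores documents against a query using cosine similarity. It assumes that IDFP refers to the commonly cited probabilistic inverse document frequency, log((N - n) / n), where N is the number of documents and n is the number of documents containing the term; the abstract does not state the formula, so this is an assumption. The function names, the raw term-frequency local weight, and the clamping of negative weights to zero are illustrative choices, not the authors' implementation.

```python
import math
from collections import Counter


def idfp_weights(docs):
    """Global weight per term, assuming IDFP = log((N - n) / n).

    docs is a list of token lists. Weights are clamped at 0 so that
    terms appearing in half or more of the documents contribute nothing
    (an illustrative choice, not taken from the paper).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = {}
    for term, n in df.items():
        if n >= n_docs:
            # log(0) is undefined; a term in every document carries no signal.
            weights[term] = 0.0
        else:
            weights[term] = max(0.0, math.log((n_docs - n) / n))
    return weights


def vectorize(tokens, weights):
    """Sparse vector: raw term frequency times the global weight."""
    tf = Counter(tokens)
    return {t: tf[t] * weights.get(t, 0.0) for t in tf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy collection (hypothetical tokens standing in for Arabic terms).
docs = [["a", "b"], ["a", "c"], ["a", "d"], ["b", "e"]]
weights = idfp_weights(docs)
query_vec = vectorize(["c"], weights)
scores = [cosine(query_vec, vectorize(d, weights)) for d in docs]
```

In this toy collection, only the document containing the rare term "c" receives a non-zero score, which is the behaviour a global weight of this kind is meant to produce.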

Similar resources

On Improving Pseudo-Relevance Feedback using an Absorbing Document

Pseudo-Relevance Feedback assumes that the top-ranked k documents of the initial retrieval are relevant, and then terms of these documents are used to re-weight the terms of the initial query (add new terms and/or change the weights of existing terms in the query). In this paper, we propose a new approach for query expansion for ad hoc search, by using an absorbing document which is the cross p...

Pseudo-Relevance Feedback Method based on the Cross Product of Irrelevant Documents

Pseudo-Relevance Feedback assumes that the top-ranked k documents of the initial retrieval are relevant, and then terms of these documents are used to re-weight the terms of the initial query (add new terms and/or change the weights of existing terms in the query). In this paper, we propose a new approach for query expansion for ad hoc search, by using an absorbing document which is the cross p...

Effective Structured Query Formulation for Session Search

In this work, we focus on formulating effective structured queries for session search. For a given query, phrase-like text nuggets are identified and formulated into Lemur queries to feed into the Lemur search engine. Nuggets are substrings in qn, similar to phrases but not necessarily as semantically coherent as phrases. We assume that a valid nugget appears frequently in top returned snip...

SJTU at TREC 2004: Web Track Experiments

Yiming Lu, Jian Hu, Fanyuan Ma (Department of Computer Science & Engineering, Shanghai Jiaotong University, Shanghai 200030) {luyiniao, hujian, ma-fy}@sjtu.edu.cn Abstract: This is the first year our lab has participated in TREC. We participated in the Mixed-Query task of the Web track. All the runs we submitted are based on the modified Okapi weighting scheme. In addition, we used several heurist...

Document Re-ordering Based on Key Terms in Top Retrieved Documents

In this paper, we propose a method to improve the precision of the top retrieved documents by re-ordering the documents returned by the initial retrieval. To re-order the documents, we first automatically extract key terms from the top N (N<=30) retrieved documents; we then collect the key terms that occur in the query, together with their document frequencies in the top N retrieved documents; finally, we use these collected...


Publication date: 2010